{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Creating A New MLRun Project\n",
    "  --------------------------------------------------------------------\n",
    "\n",
    "creating a full project with multiple functions and workflow and working wit Git.\n",
    "\n",
    "#### **notebook how-to's**\n",
    "* Add local or library/remote functions\n",
    "* Add a workflow\n",
    "* Save to a remote git\n",
    "* Run pipeline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='top'></a>\n",
    "#### **steps**\n",
    "**[Add functions](#load-functions)**<br>\n",
    "**[Create and save a workflow](#create-workflow)**<br>\n",
    "**[Update remote git](#git-remote)**<br>\n",
    "**[Run a pipeline workflow](#run-pipeline)**<br>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mlrun import new_project, code_to_function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# update the dir and repo to reflect real locations\n",
    "# the remote git repo must be initialized in GitHub\n",
    "project_dir = \"/User/new-proj\"\n",
    "remote_git = \"https://github.com/<my-org>/<my-repo>.git\"\n",
    "newproj = new_project(\"new-project\", project_dir, init_git=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set the remote git repo and pull to sync in case it has some content"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "newproj.create_remote(remote_git)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "newproj.pull()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='load-functions'></a>\n",
    "### Load functions from remote URLs or marketplace\n",
    "We create two functions:\n",
    "1. Load a function from the function market (converted into a function object)\n",
    "2. Create a function from file in the context dir (w copy a demo file into the dir) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "function: load-dataset\n",
      "load a toy dataset from scikit-learn\n",
      "default handler: load_dataset\n",
      "entry points:\n",
      "  load_dataset: Loads a scikit-learn toy dataset for classification or regression\n",
      "\n",
      "The following datasets are available ('name' : desription):\n",
      "\n",
      "    'boston'          : boston house-prices dataset (regression)\n",
      "    'iris'            : iris dataset (classification)\n",
      "    'diabetes'        : diabetes dataset (regression)\n",
      "    'digits'          : digits dataset (classification)\n",
      "    'linnerud'        : linnerud dataset (multivariate regression)\n",
      "    'wine'            : wine dataset (classification)\n",
      "    'breast_cancer'   : breast cancer wisconsin dataset (classification)\n",
      "\n",
      "The scikit-learn functions return a data bunch including the following items:\n",
      "- data              the features matrix\n",
      "- target            the ground truth labels\n",
      "- DESCR             a description of the dataset\n",
      "- feature_names     header for data\n",
      "\n",
      "The features (and their names) are stored with the target labels in a DataFrame.\n",
      "\n",
      "For further details see https://scikit-learn.org/stable/datasets/index.html#toy-datasets\n",
      "    context(MLClientCtx)  - function execution context\n",
      "    dataset(str)  - name of the dataset to load\n",
      "    name(str)  - artifact name (defaults to dataset)\n",
      "    file_ext(str)  - output file_ext: parquet or csv, default=parquet\n",
      "    params(dict)  - params of the sklearn load_data method\n"
     ]
    }
   ],
   "source": [
    "newproj.set_function(\"hub://load_dataset\", \"ingest\").doc()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a local function (use code from mlrun examples)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n",
      "                                 Dload  Upload   Total   Spent    Left  Speed\n",
      "100   617  100   617    0     0   3762      0 --:--:-- --:--:-- --:--:--  3739\n"
     ]
    }
   ],
   "source": [
    "!curl -o {project_dir}/handler.py https://raw.githubusercontent.com/mlrun/mlrun/master/examples/handler.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "kind: job\n",
      "metadata:\n",
      "  name: tstfunc\n",
      "  tag: ''\n",
      "  project: new-project\n",
      "  categories: []\n",
      "spec:\n",
      "  command: ''\n",
      "  args: []\n",
      "  volumes: []\n",
      "  volume_mounts: []\n",
      "  env: []\n",
      "  default_handler: ''\n",
      "  description: ''\n",
      "  build:\n",
      "    functionSourceCode: CmRlZiBteV9mdW5jKGNvbnRleHQsIHAxOiBpbnQgPSAxLCBwMj0nYS1zdHJpbmcnKToKICAgICIiInRoaXMgaXMgYSB0d28gcGFyYW0gZnVuY3Rpb24KCiAgICA6cGFyYW0gcDEgIGZpcnN0IHBhcmFtCiAgICA6cGFyYW0gcDIgIDJuZCBwYXJhbQogICAgIiIiCiAgICAjIGFjY2VzcyBpbnB1dCBtZXRhZGF0YSwgdmFsdWVzLCBmaWxlcywgYW5kIHNlY3JldHMgKHBhc3N3b3JkcykKICAgIHByaW50KCdSdW46IHt9ICh1aWQ9e30pJy5mb3JtYXQoY29udGV4dC5uYW1lLCBjb250ZXh0LnVpZCkpCiAgICBwcmludCgnUGFyYW1zOiBwMT17fSwgcDI9e30nLmZvcm1hdChwMSwgcDIpKQogICAgY29udGV4dC5sb2dnZXIuaW5mbygncnVubmluZyBmdW5jdGlvbicpCgogICAgIyBSVU4gc29tZSB1c2VmdWwgY29kZSBlLmcuIE1MIHRyYWluaW5nLCBkYXRhIHByZXAsIGV0Yy4KCiAgICAjIGxvZyBzY2FsYXIgcmVzdWx0IHZhbHVlcyAoam9iIHJlc3VsdCBtZXRyaWNzKQogICAgY29udGV4dC5sb2dfcmVzdWx0KCdhY2N1cmFjeScsIHAxICogMikKICAgIGNvbnRleHQubG9nX3Jlc3VsdCgnbG9zcycsIHAxICogMykKICAgIGNvbnRleHQuc2V0X2xhYmVsKCdmcmFtZXdvcmsnLCAnc2tsZWFybicpCgo=\n",
      "    base_image: mlrun/mlrun\n",
      "    commands:\n",
      "    - pip install pandas\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# add function with build config (base image, run command)\n",
    "fn = code_to_function(\"tstfunc\", filename=\"handler.py\", kind=\"job\")\n",
    "fn.build_config(base_image=\"mlrun/mlrun\", commands=[\"pip install pandas\"])\n",
    "newproj.set_function(fn)\n",
    "print(newproj.func(\"tstfunc\").to_yaml())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='create-workflow'></a>\n",
    "### Create a workflow file and store it in the context dir"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting /User/new-proj/workflow.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile {project_dir}/workflow.py\n",
    "from kfp import dsl\n",
    "from mlrun import mount_v3io\n",
    "\n",
    "funcs = {}\n",
    "\n",
    "def init_functions(functions: dict, project=None, secrets=None):\n",
    "    functions['ingest'].apply(mount_v3io())\n",
    "\n",
    "@dsl.pipeline(\n",
    "    name='demo project', description='Shows how to use mlrun project.'\n",
    ")\n",
    "def kfpipeline(p1=3):\n",
    "    # first step build the function container\n",
    "    builder = funcs['tstfunc'].deploy_step(with_mlrun=False)\n",
    "    \n",
    "    ingest = funcs['ingest'].as_step(name='load-data', params={'dataset': 'boston'})\n",
    "\n",
    "    # first step\n",
    "    s1 = funcs['tstfunc'].as_step(name='step-one', handler='my_func',\n",
    "         image=builder.outputs['image'],\n",
    "         params={'p1': p1})\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "newproj.set_workflow(\"main\", \"workflow.py\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "name: new-project\n",
      "functions:\n",
      "- url: hub://load_dataset\n",
      "  name: ingest\n",
      "- name: tstfunc\n",
      "  spec:\n",
      "    kind: job\n",
      "    metadata:\n",
      "      name: tstfunc\n",
      "      tag: ''\n",
      "      project: new-project\n",
      "      categories: []\n",
      "    spec:\n",
      "      command: ''\n",
      "      args: []\n",
      "      env: []\n",
      "      default_handler: ''\n",
      "      description: ''\n",
      "      build:\n",
      "        functionSourceCode: CmRlZiBteV9mdW5jKGNvbnRleHQsIHAxOiBpbnQgPSAxLCBwMj0nYS1zdHJpbmcnKToKICAgICIiInRoaXMgaXMgYSB0d28gcGFyYW0gZnVuY3Rpb24KCiAgICA6cGFyYW0gcDEgIGZpcnN0IHBhcmFtCiAgICA6cGFyYW0gcDIgIDJuZCBwYXJhbQogICAgIiIiCiAgICAjIGFjY2VzcyBpbnB1dCBtZXRhZGF0YSwgdmFsdWVzLCBmaWxlcywgYW5kIHNlY3JldHMgKHBhc3N3b3JkcykKICAgIHByaW50KCdSdW46IHt9ICh1aWQ9e30pJy5mb3JtYXQoY29udGV4dC5uYW1lLCBjb250ZXh0LnVpZCkpCiAgICBwcmludCgnUGFyYW1zOiBwMT17fSwgcDI9e30nLmZvcm1hdChwMSwgcDIpKQogICAgY29udGV4dC5sb2dnZXIuaW5mbygncnVubmluZyBmdW5jdGlvbicpCgogICAgIyBSVU4gc29tZSB1c2VmdWwgY29kZSBlLmcuIE1MIHRyYWluaW5nLCBkYXRhIHByZXAsIGV0Yy4KCiAgICAjIGxvZyBzY2FsYXIgcmVzdWx0IHZhbHVlcyAoam9iIHJlc3VsdCBtZXRyaWNzKQogICAgY29udGV4dC5sb2dfcmVzdWx0KCdhY2N1cmFjeScsIHAxICogMikKICAgIGNvbnRleHQubG9nX3Jlc3VsdCgnbG9zcycsIHAxICogMykKICAgIGNvbnRleHQuc2V0X2xhYmVsKCdmcmFtZXdvcmsnLCAnc2tsZWFybicpCgo=\n",
      "        base_image: mlrun/mlrun\n",
      "        commands:\n",
      "        - pip install pandas\n",
      "workflows:\n",
      "- name: main\n",
      "  path: workflow.py\n",
      "artifacts: []\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(newproj.to_yaml())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='git-remote'></a>\n",
    "### Register and push the project to a remote Repo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "newproj.push(\"master\", \"first push\", add=[\"handler.py\", \"workflow.py\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "newproj.source"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='run-pipeline'></a>\n",
    "### Run the workflow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[mlrun] 2020-03-30 20:26:21,292 WARNING!, you seem to have uncommitted git changes, use .push()\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/conda/lib/python3.6/site-packages/kfp/components/_data_passing.py:168: UserWarning: Missing type name was inferred as \"Integer\" based on the value \"3\".\n",
      "  warnings.warn('Missing type name was inferred as \"{}\" based on the value \"{}\".'.format(type_name, str(value)))\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "Experiment link <a href=\"https://dashboard.default-tenant.app.yh48.iguazio-cd2.com/pipelines/#/experiments/details/316345ca-f728-41f3-8679-49160af5c92b\" target=\"_blank\" >here</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "Run link <a href=\"https://dashboard.default-tenant.app.yh48.iguazio-cd2.com/pipelines/#/runs/details/526642b2-c595-421c-b81c-c51d7f47fb98\" target=\"_blank\" >here</a>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[mlrun] 2020-03-30 20:26:21,411 Pipeline run id=526642b2-c595-421c-b81c-c51d7f47fb98, check UI or DB for progress\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'526642b2-c595-421c-b81c-c51d7f47fb98'"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "newproj.run(\n",
    "    \"main\",\n",
    "    arguments={},\n",
    "    artifact_path=\"v3io:///users/admin/mlrun/kfp/{{workflow.uid}}/\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**[back to top](#top)**"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
